Python Study Notes: Building Data Quickly with a pandas MultiIndex

This article is also published on my self-hosted website: 微確幸資訊站

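Goal for the data:

Academic year (學年): 108, 109, 110
Semester (學期): 1, 2
Program (學制): "博士班" (doctoral), "碩士班" (master's), "大學部" (undergraduate)
Gender (性別): "女" (female), "男" (male)

With these fields as the index, the table shows the number of students (學生人數).
The total number of rows is the product of the number of values in each field: 3 × 2 × 3 × 2 = 36.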
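
The row count can be checked in a few lines. This is a minimal sketch of the arithmetic; the variable names are chosen for illustration and do not come from the article's code.

```python
import math

# The four fields listed above (variable names are assumptions for illustration).
years = [108, 109, 110]
semesters = [1, 2]
programs = ["博士班", "碩士班", "大學部"]
genders = ["女", "男"]

# Number of rows = product of the number of values in each field.
n_rows = math.prod(len(values) for values in (years, semesters, programs, genders))
print(n_rows)  # 36
```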

The usual, verbose way to build it:

import pandas as pd
import numpy as np

arrays = [
    ["108", "108", "108", "108", "108", "108", "108", "108", "108", "108", "108", "108",
     "109", "109", "109", "109", "109", "109", "109", "109", "109", "109", "109", "109",
     "110", "110", "110", "110", "110", "110", "110", "110", "110", "110", "110", "110"],
    ["1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2",
     "1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2",
     "1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2",],
    ["博士班", "博士班", "碩士班", "碩士班", "大學部", "大學部", "博士班", "博士班", "碩士班", "碩士班", "大學部", "大學部",
     "博士班", "博士班", "碩士班", "碩士班", "大學部", "大學部", "博士班", "博士班", "碩士班", "碩士班", "大學部", "大學部",
     "博士班", "博士班", "碩士班", "碩士班", "大學部", "大學部", "博士班", "博士班", "碩士班", "碩士班", "大學部", "大學部"],
    ["女", "男", "女", "男", "女", "男", "女", "男", "女", "男", "女", "男", 
     "女", "男", "女", "男", "女", "男", "女", "男", "女", "男", "女", "男", 
     "女", "男", "女", "男", "女", "男", "女", "男", "女", "男", "女", "男"],
]
tuples = list(zip(*arrays))
tuples

https://ithelp.ithome.com.tw/upload/images/20221209/201223351D8CWhTjMG.jpg
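
The call zip(*arrays) turns "one list per index level" into "one tuple per row". A reduced sketch with two levels and four illustrative rows (not the article's full data) shows the effect:

```python
# A reduced version of `arrays`: one list per index level (illustrative data).
small_arrays = [
    ["108", "108", "109", "109"],  # level 1: academic year
    ["1", "2", "1", "2"],          # level 2: semester
]

# zip(*...) pairs up the i-th element of every list, so the result
# holds one tuple per row instead of one list per level.
small_tuples = list(zip(*small_arrays))
print(small_tuples)
# [('108', '1'), ('108', '2'), ('109', '1'), ('109', '2')]
```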

index = pd.MultiIndex.from_tuples(tuples, names=["學年", "學期", "學制", "性別"])
df = pd.DataFrame(np.random.randint(300,size=(36,1)),index=index,columns=["學生人數"])
df

The output is too long, so the screenshot shows only part of it:
https://ithelp.ithome.com.tw/upload/images/20221209/20122335aC1BQ4lRSy.jpg

The quick way to build it:

year = [108, 109, 110]
semester = [1, 2]
academic = ['博士班', '碩士班', '大學部']
gender = ['男', '女']
index = pd.MultiIndex.from_product([year, semester, academic, gender],
                           names=['學年', '學期', '學制', '性別'])
index

The output is too long, so the screenshot shows only part of it:
https://ithelp.ithome.com.tw/upload/images/20221209/20122335ZUJt05bvWi.jpg

df = pd.DataFrame(np.random.randint(300,size=(36,1)),index=index,columns=["學生人數"])
df

The output is too long, so the screenshot shows only part of it:
https://ithelp.ithome.com.tw/upload/images/20221209/20122335Lkx3ojazzW.jpg

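This approach produces the required data quickly. Note that here 學年 and 學期 are integers and 性別 is listed as 男 then 女, whereas the verbose version used strings and the order 女, 男; the 36 combinations are otherwise the same.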
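
As a sanity check, the index from from_product can be compared with the Cartesian product of the same lists. The sketch below uses itertools.product for that comparison, which is an addition for illustration and not part of the original code.

```python
import itertools
import pandas as pd

year = [108, 109, 110]
semester = [1, 2]
academic = ["博士班", "碩士班", "大學部"]
gender = ["男", "女"]
names = ["學年", "學期", "學制", "性別"]

quick = pd.MultiIndex.from_product([year, semester, academic, gender], names=names)

# The same rows written out explicitly as tuples, in the same order.
rows = list(itertools.product(year, semester, academic, gender))

print(len(quick))           # 36
print(list(quick) == rows)  # True
```

The last level (gender) varies fastest and the first level (year) slowest, which is the same ordering the hand-written lists follow.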

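However, data with a MultiIndex is not very intuitive to read or process.
One extra line of code replaces the MultiIndex with ordinary columns, so the result looks much like an Excel sheet.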
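
For comparison, this is roughly what selecting rows looks like while the MultiIndex is still in place. It is a sketch using DataFrame.xs; the sequential student counts are an assumption made so the example is reproducible, in place of the article's random numbers.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [[108, 109, 110], [1, 2], ["博士班", "碩士班", "大學部"], ["男", "女"]],
    names=["學年", "學期", "學制", "性別"],
)
# Sequential numbers instead of random ones, so the run is reproducible.
df = pd.DataFrame({"學生人數": np.arange(36)}, index=index)

# Select one academic year; the matched level is dropped from the result.
year_109 = df.xs(109, level="學年")
print(year_109.shape)              # (12, 1)
print(list(year_109.index.names))  # ['學期', '學制', '性別']

# Select on two levels at once: a tuple of keys plus a list of level names.
phd_109 = df.xs((109, "博士班"), level=["學年", "學制"])
print(phd_109.shape)               # (4, 1)
```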

Follow-up processing

df = df.reset_index()
df

The output is too long, so the screenshot shows only part of it:
https://ithelp.ithome.com.tw/upload/images/20221209/201223354UDgeCN6Ut.jpg
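
Once the index levels are regular columns, filtering and grouping read like ordinary table operations. The sketch below rebuilds the data so it runs on its own; the fixed random seed and the names flat, phd_109, and totals are assumptions for illustration.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [[108, 109, 110], [1, 2], ["博士班", "碩士班", "大學部"], ["男", "女"]],
    names=["學年", "學期", "學制", "性別"],
)
rng = np.random.default_rng(0)  # fixed seed (an assumption) so the run is repeatable
df = pd.DataFrame(rng.integers(0, 300, size=(36, 1)), index=index, columns=["學生人數"])

# reset_index moves every index level into a regular column.
flat = df.reset_index()
print(flat.columns.tolist())  # ['學年', '學期', '學制', '性別', '學生人數']

# Filter with plain boolean masks on the columns.
phd_109 = flat[(flat["學年"] == 109) & (flat["學制"] == "博士班")]
print(len(phd_109))  # 4 (2 semesters x 2 genders)

# Total students per academic year.
totals = flat.groupby("學年")["學生人數"].sum()
print(totals)
```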

